BitMat – Scalable Indexing and Querying of Large RDF Graphs

Authors

  • Medha Atre
  • Vineet Chaoji
  • Mohammed J. Zaki
  • James A. Hendler
Abstract

The growing size of Semantic Web data expressed in the form of Resource Description Framework (RDF) has made it necessary to develop effective ways of storing this data to save space and to query it in a scalable manner. SPARQL – the query language for RDF data – closely follows SQL syntax. As a natural consequence, most RDF storage and querying engines are based on modern database storage and query optimization techniques. Previous work has used vertical partitioning with column stores (C-Store, MonetDB) and 6-way indexing (RDF-3X, Hexastore) for storing and querying RDF data. Although these approaches perform well for highly selective queries, scaling the querying method and its optimizations remains a challenge for queries with low-selectivity triple patterns. In this paper we present a new way of storing RDF graphs in a run-length-encoded bit-vector format called BitMat, and we propose a novel two-phase SPARQL join query processing algorithm. In the first phase it prunes the candidate RDF triples, and in the second phase it stitches the pruned RDF triples together to generate the final results. Our query processing method does not build intermediate join tables and works directly on the compressed data. Our evaluation shows not only that BitMat provides an efficient method of storing RDF graphs, but also that our join query processing algorithm scales well for low-selectivity join queries, where state-of-the-art RDF query processors face problems.
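To make the idea of working directly on compressed data concrete, here is a minimal Python sketch. It is an illustration only, not the BitMat implementation: the `(first_bit, run_lengths)` layout and the function names `rle_encode`, `rle_decode`, and `rle_and` are assumptions made for this example. It run-length-encodes two bit-vectors and ANDs them by walking their runs, without expanding either vector. An operation of this kind could, for instance, intersect the candidate bindings of a variable shared by two triple patterns during a pruning step.

```python
from itertools import groupby


def rle_encode(bits):
    """Encode a list of 0/1 values as (first_bit, run_lengths)."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    first = bits[0] if bits else 0
    return first, runs


def rle_decode(encoded):
    """Expand (first_bit, run_lengths) back into a list of 0/1 values."""
    first, runs = encoded
    bits, value = [], first
    for length in runs:
        bits.extend([value] * length)
        value ^= 1  # runs alternate between 0 and 1
    return bits


def rle_and(a, b):
    """Bitwise AND of two equal-length RLE vectors, run by run."""
    (bit_a, runs_a), (bit_b, runs_b) = a, b
    out_first, out_runs, last = 0, [], None
    i = j = 0
    left_a = runs_a[0] if runs_a else 0
    left_b = runs_b[0] if runs_b else 0
    while i < len(runs_a) and j < len(runs_b):
        step = min(left_a, left_b)  # overlap of the two current runs
        value = bit_a & bit_b
        if value == last:
            out_runs[-1] += step  # extend the current output run
        else:
            if last is None:
                out_first = value
            out_runs.append(step)
            last = value
        left_a -= step
        left_b -= step
        if left_a == 0:  # advance to the next run of a
            i += 1
            bit_a ^= 1
            left_a = runs_a[i] if i < len(runs_a) else 0
        if left_b == 0:  # advance to the next run of b
            j += 1
            bit_b ^= 1
            left_b = runs_b[j] if j < len(runs_b) else 0
    return out_first, out_runs


# Hypothetical candidate bindings of a shared variable in two triple patterns.
pattern_1 = rle_encode([1, 1, 1, 0, 0, 1, 1])
pattern_2 = rle_encode([1, 0, 1, 1, 0, 1, 0])
pruned = rle_and(pattern_1, pattern_2)
```

Decoding `pruned` with `rle_decode` gives the same bits as ANDing the two uncompressed vectors position by position; the cost of `rle_and` depends on the number of runs rather than the number of bits.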


Similar resources

BitMat: An In-core RDF Graph Store for Join Query Processing

With the growing size of RDF data sources, the need for a compact representation providing efficient query interface has become compelling. In this paper, we introduce BitMat, a main memory based compressed bit-matrix structure. The key aspects of BitMat are as follows: i) its RDF graph representation is very compact compared to the conventional disk-based and existing main-memory RDF stores, a...

BitMat: A Main Memory RDF Triple Store

BitMat is a main memory based bit-matrix structure for representing a large set of RDF triples, designed primarily to allow processing of conjunctive triple pattern (join) queries. The key aspects are as follows: i) its RDF triple-set representation is compact compared to conventional disk-based and existing main-memory RDF stores, ii) basic join query processing employs logical bitwise AND/OR ...

Scalable Semantics - The Silver Lining of Cloud Computing

Semantic inferencing and querying across large-scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google’s MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF Molecules appear to offer an ideal appro...

Dynamic Querying of Mass-Storage RDF Data with Rule-Based Entailment Regimes

RDF Schema (RDFS), as a lightweight ontology language, is gaining popularity and, consequently, tools for scalable RDFS inference and querying are needed. SPARQL has recently become a W3C standard for querying RDF data, but it mostly provides means for querying simple RDF graphs only, whereas querying with respect to RDFS or other entailment regimes is left outside the current specification. In t...

A Graph-based Approach to Indexing Semantic Web Data

To the best of our knowledge, existing Semantic Web (SW) search systems fail to index RDF graph structures as graphs. They either do not index graph structures and retrieve them by run-time formal queries, or index all row triples from the back-end repositories. This increases the overhead of indexing for very large RDF documents. Moreover, the graph explorations from row triples can be complic...

Publication date: 2011